Missing Data and Imputation

Authors

Javier Estrada

Michael Underwood

Elizabeth Subject-Scott

Published

April 10, 2023


Introduction

Missing Data

Missing data occurs when one or more values are absent from a dataset. This can happen for many reasons, intentional or unintentional, and missingness can be classified into the following three categories, otherwise known as missingness mechanisms (Mainzer et al. 2023):

  • Missing completely at random (MCAR): the probability of a value being missing is independent of both the observed and the unobserved data.

  • Missing at random (MAR): the probability of a value being missing depends only on the observed values.

  • Missing not at random (MNAR): the probability of a value being missing depends on the missing values themselves, possibly in addition to the observed values.
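These definitions can be made concrete with a small simulation. The sketch below (hypothetical data, not from this paper's dataset) creates MCAR missingness by deleting values of y uniformly at random, and MAR missingness by letting the deletion probability depend on an observed covariate x:

```r
# Sketch: simulate MCAR vs. MAR missingness (hypothetical data)
set.seed(42)
n <- 1000
x <- rnorm(n)                 # fully observed covariate
y <- 2 * x + rnorm(n)         # variable that will receive missing values

# MCAR: every value of y has the same 20% chance of being missing
y_mcar <- y
y_mcar[runif(n) < 0.2] <- NA

# MAR: the chance of missingness in y depends on the observed x
p_mar <- plogis(-1.4 + 1.5 * x)   # higher x -> more likely missing
y_mar <- y
y_mar[runif(n) < p_mar] <- NA

# Compare the observed means with the true mean
c(true = mean(y),
  mcar = mean(y_mcar, na.rm = TRUE),
  mar  = mean(y_mar,  na.rm = TRUE))
```

Under MAR, complete-case summaries of y become biased (here, the observed mean is pulled down because high-x subjects, who tend to have high y, are more often missing), while under MCAR they remain unbiased, only less precise.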

Figure 1: Graphical Representation of Missingness Mechanisms (Schafer and Graham 2002)

(X are the completely observed variables. Y are the partly missing variables. Z is the component of the cause of missingness unrelated to X and Y. R is the missingness.)

Looking for patterns in the missing data can help us determine to which category it belongs. These mechanisms are important in determining how to handle the missing data. MCAR is the best-case scenario but seldom occurs; MAR and MNAR are more common.

The problem with ignoring missing values is that the remaining data no longer give a true representation of the dataset, which can bias the analysis and reduces its statistical power (van Ginkel et al. 2020). To enhance the quality of the research, two practices should be followed: explicitly acknowledge missing data problems and the conditions under which they occur, and employ principled methods to handle the missing data (Dong and Peng 2013).

Methods to Deal with Missing Data

There are three families of methods for dealing with missing data: likelihood and Bayesian methods, weighting methods, and imputation methods (Cao et al. 2021). Missing data can also be handled by simply deleting the incomplete observations.

  • Likelihood and Bayesian methods combine information from a prior predictive distribution with the evidence obtained in a sample to predict a value. They require technical coding and advanced statistical knowledge.

  • Weighting methods are a traditional approach in which weights derived from the available data are used to adjust for non-response in a survey. They become inefficient when there are extreme weights or when many weights are needed.

  • Imputation methods use estimates derived from the original, observed data to fill in the missing values. There are two types of imputation: single and multiple.

Deleting missing data

Listwise deletion removes an entire observation from the dataset when any of its values is missing. Deleting missing data can lead to the loss of important information and is therefore generally not recommended. In certain cases, when the amount of missing data is small and the mechanism is MCAR, listwise deletion can be used: there usually won't be bias, but potentially important information may still be lost.

T-tests and chi-square tests can be used on pairs of predictor variables to determine whether the means differ significantly between subjects with and without missing values. According to van Ginkel et al. (2020), if the test is significant, the null hypothesis is rejected, indicating that the missing values are not randomly scattered throughout the data; this implies that the missing data is MAR or MNAR. Conversely, a nonsignificant result implies that the data cannot be MAR. This does not rule out MNAR; other information about the population is needed to determine that.
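As a sketch of this diagnostic (hypothetical variables, not the paper's credit data), create an indicator for missingness in y and t-test whether the mean of the observed variable x differs between subjects with and without missing y:

```r
# Sketch: test whether missingness in y is related to an observed variable x
set.seed(1)
x <- rnorm(200)
y <- x + rnorm(200)
y[x > 1] <- NA                 # missingness depends on observed x (MAR)

miss <- is.na(y)               # indicator: TRUE = y is missing
t.test(x ~ miss)               # significant -> missingness is not random
```

A significant result here would lead us to reject the hypothesis that the missingness is unrelated to x.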

Whenever missing data is categorized as MAR or MNAR, listwise deletion is wasteful and the analysis biased. Alternative methods of dealing with the missing data are recommended: either pairwise deletion or imputation.

Pairwise deletion removes an observation only from the analyses that involve its missing variable. It allows more data to be analyzed than listwise deletion but limits the ability to make inferences about the total sample. For this reason, imputation is recommended for properly dealing with missing data.
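The contrast between the two deletion strategies can be seen with base R's cor(), whose use argument selects the strategy; the data below are hypothetical:

```r
# Sketch: listwise vs. pairwise deletion when computing correlations
set.seed(7)
d <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
d$b[1:10]  <- NA   # missing values in b only
d$c[41:50] <- NA   # missing values in c only

# Listwise: any row with an NA anywhere is dropped (30 rows remain)
cor(d, use = "complete.obs")

# Pairwise: each correlation uses all rows observed for that pair,
# e.g. cor(a, b) still uses 40 rows
cor(d, use = "pairwise.complete.obs")
```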

Preferred Method to Handle Missing Data

Imputation is the preferred method for handling missing data. It consists of replacing missing data with estimates obtained from the original, available data. After imputation, there is a full dataset to analyze. To improve statistical power, the number of imputations created should be at least equal to the percentage of missing data (5% of values missing calls for 5 imputations, 10% for 10, 20% for 20, and so on) (Pedersen et al. 2017). According to Wulff and Jeppesen (2017), 3-5 imputations are sufficient, and 10 are more than enough.

Single, or univariate, imputation uses only one estimate to replace each missing value. Methods of single imputation include mean imputation, last observation carried forward, and random imputation. The following is a brief explanation of each:

  • Mean imputation is a straightforward process: the mean of the observed values of a variable is calculated and used to replace each of that variable's missing values. The problem with this method is that it reduces the variance, which leads to confidence intervals that are too narrow.

  • Last Observation Carried Forward (LOCF) is a technique for replacing a missing value in longitudinal studies with a previously observed value (the most recent value is carried forward) (Streiner 2008). The problem with this method is that it assumes the last observed value persists unchanged, which in reality is usually not the case.

  • Random imputation randomly draws an observed value and uses it to fill in a missing value. The problem with this method is that it introduces additional variability.

These single imputation methods are flawed: they often underestimate standard errors or produce p-values that are too small (Dong and Peng 2013), which can bias the analysis. Multiple imputation is the better method because it handles missing data more faithfully and provides less biased results.
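The variance shrinkage caused by mean imputation, for example, is easy to demonstrate on simulated data:

```r
# Sketch: mean imputation shrinks the variance of a variable
set.seed(3)
y <- rnorm(500, mean = 50, sd = 10)
y_obs <- y
y_obs[sample(500, 150)] <- NA                      # 30% missing

y_mean_imp <- y_obs
y_mean_imp[is.na(y_mean_imp)] <- mean(y_obs, na.rm = TRUE)

# The imputed version has a noticeably smaller standard deviation,
# which leads to standard errors and confidence intervals that are too small
c(sd_true = sd(y), sd_after_mean_imputation = sd(y_mean_imp))
```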

Multiple, or multivariate, imputation replaces the missing data with several estimates by creating multiple versions of the original dataset. It can be done using a regression model, or a sequence of regression models, such as linear, logistic, and Poisson. A set of M plausible values is generated for each unobserved data point, resulting in M complete datasets (Dong and Peng 2013). The new values are randomly drawn from predictive distributions, either through joint modeling (JM, which is rarely used anymore) or fully conditional specification (FCS) (Wongkamthong and Akande 2023). Each complete dataset is then analyzed, and the results are combined into a single pooled estimate.

The purpose of multiple imputation is to create a pool of imputed datasets for analysis, but if the pooled results are lacking, then multiple imputation should not be done (Mainzer et al. 2023). Another reason not to use multiple imputation is when there are very few missing values; there may be no benefit in using it. Also worth noting is that some statistical software already has built-in features to deal with missing data.

Multiple imputation by chained equations, otherwise known as MICE, is the most common and preferred method of multiple imputation (Wulff and Jeppesen 2017). It provides a more reliable way to analyze data with missing values. For this reason, this paper focuses on the methodology and application of the MICE process.

Code
#loading packages
library(DiagrammeR)

Figure 2: Flowchart of the MICE-process based on procedures proposed by Rubin (Wulff and Jeppesen 2017)

Code
DiagrammeR::grViz("digraph {

# initiate graph
graph [layout = dot, rankdir = LR, label = 'The MICE-Process\n\n',labelloc = t, fontcolor = DarkSlateBlue, fontsize = 45]

# global node settings
node [shape = rectangle, style = filled, fillcolor = AliceBlue, fontcolor = DarkSlateBlue, fontsize = 35]
bgcolor = none

# label nodes
incomplete [label =  'Incomplete data set']
imputed1 [label = 'Imputed \n data set 1']
estimates1 [label = 'Estimates from \n analysis 1']
rubin [label = 'Rubin rules', shape = diamond]
combined [label = 'Combined results']
imputed2 [label = 'Imputed \n data set 2']
estimates2 [label = 'Estimates from \n analysis 2']
imputedm [label = 'Imputed \n data set m']
estimatesm [label = 'Estimates from \n analysis m']


# edge definitions with the node IDs
incomplete -> imputed1 [arrowhead = vee, color = DarkSlateBlue]
imputed1 -> estimates1 [arrowhead = vee, color = DarkSlateBlue]
estimates1 -> rubin [arrowhead = vee, color = DarkSlateBlue]
incomplete -> imputed2 [arrowhead = vee, color = DarkSlateBlue]
imputed2 -> estimates2 [arrowhead = vee, color = DarkSlateBlue]
estimates2-> rubin [arrowhead = vee, color = DarkSlateBlue]
incomplete -> imputedm [arrowhead = vee, color = DarkSlateBlue]
imputedm -> estimatesm [arrowhead = vee, color = DarkSlateBlue]
estimatesm -> rubin [arrowhead = vee, color = DarkSlateBlue]
rubin -> combined [arrowhead = vee, color = DarkSlateBlue]
}")

*Rubin’s Rules: average the estimates across the m analyses; calculate the standard errors and variances of the m estimates; combine them using the adjustment term (1 + 1/m).

Other Methods of Imputation

There are other methods of imputation worth noting; they are briefly described below.

Regression imputation is based on a linear regression model. Missing data is randomly drawn from a conditional distribution when variables are continuous, and from a logistic regression model when they are categorical (van Ginkel et al. 2020).

Predictive mean matching is also based on a linear regression model. The approach is the same as regression imputation for categorical missing values but differs for continuous variables: instead of random draws from a conditional distribution, missing values are filled with observed values from cases whose predicted values are closest to the predicted value of the missing case (van Ginkel et al. 2020).
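A minimal sketch of the predictive-mean-matching idea (a single closest donor; the mice implementation instead samples from a small set of close donors and handles far more general models):

```r
# Sketch of predictive mean matching with one donor per missing value
set.seed(9)
x <- rnorm(100)
y <- 3 + 2 * x + rnorm(100)
y[sample(100, 20)] <- NA

fit  <- lm(y ~ x, data = data.frame(x, y))          # complete-case regression
pred <- predict(fit, newdata = data.frame(x = x))   # predictions for everyone

obs   <- which(!is.na(y))
y_pmm <- y
for (i in which(is.na(y))) {
  # donor = observed case with the closest predicted value
  donor    <- obs[which.min(abs(pred[obs] - pred[i]))]
  y_pmm[i] <- y[donor]        # copy the donor's actually observed y
}
summary(y_pmm)   # every imputed value is a genuinely observed value
```

Because imputed values are borrowed from real observations, PMM never produces impossible values (e.g., negative incomes).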

Hot deck (HD) imputation replaces a missing value with an observed response from a similar unit, known as the donor. The donor can be chosen randomly or deterministically (based on a metric or value) (Thongsri and Samart 2022). It does not rely on model fitting.

Stochastic regression (SR) imputation is an extension of regression imputation. The process is the same, but a residual term drawn from the normal distribution of the regression residuals is added to the imputed value (Thongsri and Samart 2022). This maintains the variability of the data.

Random forest (RF) imputation is based on machine learning algorithms. Missing values are first replaced with the mean or mode of the variable, and the dataset is then split into a training set and a prediction set (Thongsri and Samart 2022); the missing values are then replaced with predictions from these fitted models. This type of imputation can handle continuous or categorical variables with complex interactions.

Methodology

Multiple Imputation by Chained Equations (MICE)

In multiple imputation, an imputed value is created for each missing data point in each of M rounds, resulting in M complete datasets. From each of the M datasets, an estimate of \(\theta\) is obtained.

The combined estimator of \(\theta\) is given by:

\({\hat{\theta}}_{M} = \displaystyle \frac{1}{M}\sum_{m = 1}^{M} {\hat{\theta}}_{m}\)

The proposed variance estimator of \({\hat{\theta}}_{M}\) is given by:

\({\hat{\Phi}}_{M} = {\overline{\phi}}_{M} + \left(1 + \displaystyle \frac{1}{M}\right)B_{M}\)

where \({\overline{\phi}}_{M} = \displaystyle \frac{1}{M}\sum_{m = 1}^{M} {\hat{\phi}}_{m}\) is the average within-imputation variance,

and \(B_{M} = \displaystyle \frac{1}{M-1}\sum_{m = 1}^{M} \left({\hat{\theta}}_{m} - {\hat{\theta}}_{M}\right)^{2}\) is the between-imputation variance.

(Arnab 2017)
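These pooling formulas reduce to a few lines of R. The estimates and within-imputation variances below are hypothetical numbers standing in for the results from M = 5 imputed datasets:

```r
# Sketch: pooling M = 5 estimates with Rubin's rules
theta_m <- c(1.92, 2.10, 2.05, 1.98, 2.07)   # estimate from each analysis
var_m   <- c(0.11, 0.12, 0.10, 0.11, 0.12)   # within-imputation variances

M         <- length(theta_m)
theta_bar <- mean(theta_m)                   # combined point estimate
W         <- mean(var_m)                     # average within-imputation variance
B         <- var(theta_m)                    # between-imputation variance
total_var <- W + (1 + 1 / M) * B             # variance with (1 + 1/M) adjustment

c(estimate = theta_bar, se = sqrt(total_var))
```

The (1 + 1/M) term inflates the total variance to account for using a finite number of imputations.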

The chained equation process has the following steps (Azur et al. 2011):

Step 1:

Using simple imputation (for example, mean imputation), replace every missing value; these substituted values are referred to as “place holders”.

Step 2:

The “place holder” values for one variable are set back to missing.

Step 3:

The observed values of this variable (the dependent variable) are regressed on the other variables in the model (the independent variables), under the same assumptions as linear, logistic, or Poisson regression.

Step 4:

The missing values for this variable are replaced with predictions from this newly created regression model.

Step 5:

Repeat Steps 2-4 for each variable that has missing values until all missing values have been replaced; this constitutes one cycle.

Step 6:

Repeat Steps 2-4 for a number of cycles, updating the imputations at each cycle. The values at the end of the final cycle form one imputed dataset, and the whole procedure is repeated m times to produce the m imputed datasets.
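The steps above can be condensed into a toy chained-equations cycle for two incomplete numeric variables. This sketch uses mean imputation for the placeholders and deterministic linear-regression predictions in Step 4; a proper implementation such as mice also adds random draws so the imputations reflect uncertainty:

```r
# Toy chained-equations cycle for two incomplete numeric variables x and y
set.seed(5)
n <- 300
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
x[sample(n, 40)] <- NA
y[sample(n, 40)] <- NA
mx <- is.na(x); my <- is.na(y)      # remember where the holes are

# Step 1: fill every hole with a placeholder (here, the mean)
x[mx] <- mean(x, na.rm = TRUE)
y[my] <- mean(y, na.rm = TRUE)

for (cycle in 1:10) {               # Step 6: repeat the cycle
  # Steps 2-4 for x: fit on originally observed x, refill x's holes
  fit_x <- lm(x ~ y, subset = !mx)
  x[mx] <- predict(fit_x, newdata = data.frame(y = y[mx]))
  # Steps 2-4 for y: fit on originally observed y, refill y's holes
  fit_y <- lm(y ~ x, subset = !my)
  y[my] <- predict(fit_y, newdata = data.frame(x = x[my]))
}
```

After the loop, x and y are complete; repeating the whole procedure with different random placeholders and stochastic predictions would yield the m imputed datasets.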

Analysis and Results

Data and Visualizations

Load Data and Packages
Code
# load data
credit = read.csv("credit_data.csv")

# load libraries
library(gtsummary)
library(dplyr, warn.conflicts=FALSE)
library(mice, warn.conflicts=FALSE)
Description of Dataset

Credit score data

Details of Dataset

The credit.csv file is from the website of Dr. Lluís A. Belanche Muñoz, by way of a GitHub repository of Dr. Gaston Sanchez. It contains data on 4,454 subjects and stores a combination of continuous, categorical, and count values for 15 variables. Of the 15 variables, the Status variable contains binomial categorical values of “good” and “bad” to describe the kind of credit score each subject has. One subject was missing the outcome and was removed from the original data.

Definition of Data in Dataset
Variable Type Description
X Integer Count variable indicating the number of subjects.
Status Character 2-level categorical variable indicating the status of the subject’s credit: good or bad.
Seniority Integer Count variable indicating the seniority a subject has accumulated over the course of their life.
Home Character 6-level categorical variable indicating the subject’s relationship to their residential address: rent, owner, parents, priv, other, or ignore.
Time Integer Count variable showing how many months have elapsed since the subject’s payment deadline without the debt being paid in full.
Age Integer Count variable indicating subject’s age (in years).
Marital Character 5-level categorical variable indicating the subject’s marital status: single, married, separated, divorced, or widow.
Records Character 2-level categorical variable indicating whether the subject has a credit history record: yes or no.
Job Character 4-level categorical variable indicating the type of job the subject has: fixed, freelance, partime, or others.
Expenses Integer Count variable indicating the amount of expenses (in USD) a subject has.
Income Integer Count variable indicating the amount of income (in thousands of USD) a subject earns annually.
Assets Integer Count variable indicating the amount of assets (in USD) a subject has.
Debt Integer Count variable indicating the amount of debt (in USD) a subject has.
Amount Integer Count variable indicating the amount of money (in USD) remaining in a subject’s bank account.
Price Integer Count variable indicating the amount of money a subject earns by the end of the month.
Summary of Dataset:
Code
credit %>%
  tbl_summary(by = Status,
              missing_text = "NA") %>%
  add_p() %>%
  add_n() %>%
  add_overall %>%
  modify_header(label ~ "**Variable**") %>%
  modify_caption("**Summary of Credit Data**") %>%
  bold_labels()
Summary of Credit Data
Variable N Overall, N = 4,454¹ bad, N = 1,254¹ good, N = 3,200¹ p-value²
X 4,454 2,228 (1,114, 3,341) 2,222 (1,142, 3,366) 2,232 (1,098, 3,326) 0.3
Seniority 4,454 5 (2, 12) 2 (1, 6) 7 (2, 14) <0.001
Home 4,448 <0.001
    ignore 20 (0.4%) 9 (0.7%) 11 (0.3%)
    other 319 (7.2%) 146 (12%) 173 (5.4%)
    owner 2,107 (47%) 390 (31%) 1,717 (54%)
    parents 783 (18%) 233 (19%) 550 (17%)
    priv 246 (5.5%) 84 (6.7%) 162 (5.1%)
    rent 973 (22%) 388 (31%) 585 (18%)
    NA 6 4 2
Time 4,454 48 (36, 60) 48 (36, 60) 48 (36, 60) <0.001
Age 4,454 36 (28, 45) 34 (27, 42) 36 (28, 46) <0.001
Marital 4,453 <0.001
    divorced 38 (0.9%) 14 (1.1%) 24 (0.8%)
    married 3,241 (73%) 829 (66%) 2,412 (75%)
    separated 130 (2.9%) 64 (5.1%) 66 (2.1%)
    single 977 (22%) 328 (26%) 649 (20%)
    widow 67 (1.5%) 19 (1.5%) 48 (1.5%)
    NA 1 0 1
Records 4,454 773 (17%) 429 (34%) 344 (11%) <0.001
Job 4,452 <0.001
    fixed 2,805 (63%) 580 (46%) 2,225 (70%)
    freelance 1,024 (23%) 333 (27%) 691 (22%)
    others 171 (3.8%) 68 (5.4%) 103 (3.2%)
    partime 452 (10%) 271 (22%) 181 (5.7%)
    NA 2 2 0
Expenses 4,454 51 (35, 72) 49 (35, 75) 52 (35, 68) 0.8
Income 4,073 125 (90, 170) 100 (74, 148) 130 (100, 178) <0.001
    NA 381 217 164
Assets 4,407 3,000 (0, 6,000) 0 (0, 4,000) 4,000 (0, 7,000) <0.001
    NA 47 20 27
Debt 4,436 0 (0, 0) 0 (0, 0) 0 (0, 0) 0.3
    NA 18 13 5
Amount 4,454 1,000 (700, 1,300) 1,100 (800, 1,415) 1,000 (700, 1,250) <0.001
Price 4,454 1,400 (1,117, 1,692) 1,423 (1,062, 1,728) 1,400 (1,134, 1,678) >0.9
¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Pearson's Chi-squared test
Evaluate Dataset

First, we evaluate the dataset for missing values. As indicated in the table, the data does contain NA/missing values. We can create a table that shows each variable and how many missing values they have:

Code
# Shows which variables have missing values and how many
colSums(is.na(credit))
        X    Status Seniority      Home      Time       Age   Marital   Records 
        0         0         0         6         0         0         1         0 
      Job  Expenses    Income    Assets      Debt    Amount     Price 
        2         0       381        47        18         0         0 

We now analyze the data to decide how to handle the missing values. To do this, we create a new dataset, called new_credit, with the rows containing missing data deleted. We keep the original dataset intact so that we can later apply the method we choose for addressing the missing values. We can then count the rows to determine how many observations were deleted in total.

Code
# Creates a new dataset excluding missing values 
new_credit = na.omit(credit)

# Number of rows of new dataset
nrow(new_credit)
[1] 4039

We started with 4,454 rows, and our new dataset has 4,039: 415 rows were deleted due to missing data. Running a regression on this reduced dataset would throw away 9.3% of our data because of missingness. Instead, we can use multiple imputation to fill in the missing values so that we don’t have to discard such valuable information.

MICE in R

Using the mice (Multivariate Imputation by Chained Equations) package in R, a statistical programming language, we will create multiple datasets with imputed values in place of the missing ones. Because just under 10% of our data is missing, we will generate 10 imputations, i.e., 10 new datasets. The mice package does this seamlessly by creating plausible values from the other columns and placing them into the cells with missing data.

The first step is to check the missingness by looking for patterns in the original dataset using the md.pattern() function:

Code
credit <- credit[-c(1)]
md.pattern(credit, rotate.names = TRUE)

     Status Seniority Time Age Records Expenses Amount Price Marital Job Home
4039      1         1    1   1       1        1      1     1       1   1    1
366       1         1    1   1       1        1      1     1       1   1    1
22        1         1    1   1       1        1      1     1       1   1    1
7         1         1    1   1       1        1      1     1       1   1    1
8         1         1    1   1       1        1      1     1       1   1    1
4         1         1    1   1       1        1      1     1       1   1    1
3         1         1    1   1       1        1      1     1       1   1    0
2         1         1    1   1       1        1      1     1       1   1    0
1         1         1    1   1       1        1      1     1       1   0    1
1         1         1    1   1       1        1      1     1       1   0    0
1         1         1    1   1       1        1      1     1       0   1    1
          0         0    0   0       0        0      0     0       1   2    6
     Debt Assets Income    
4039    1      1      1   0
366     1      1      0   1
22      1      0      1   1
7       1      0      0   2
8       0      0      1   2
4       0      0      0   3
3       0      0      1   3
2       0      0      0   4
1       1      1      0   2
1       0      0      0   5
1       1      1      1   1
       18     47    381 455

In the accompanying plot, blue indicates observed values and red indicates missing values (in the text output, 1 is observed and 0 is missing). There are 11 missingness patterns.

In order to perform multiple imputation on categorical data, all string variables must be converted to factors using the as.factor() function (van Buuren 2011):

Code
credit$Status = as.factor(credit$Status)
credit$Home = as.factor(credit$Home)
credit$Marital = as.factor(credit$Marital)
credit$Records = as.factor(credit$Records)
credit$Job = as.factor(credit$Job)

Using the mice() function, 10 multiple imputations will be generated for the missing values. The default is 5, so m must be set to the desired number of imputations, and maxit controls the number of chained-equation iterations per imputation (here 10). Since the dataset contains variables of both numerical and categorical nature (with 2 and more levels), the defaultMethod argument contains pmm (predictive mean matching, for numeric data), logreg (logistic regression imputation, for binary data, i.e., factors with 2 levels), polyreg (polytomous regression imputation, for unordered factors with more than 2 levels), and polr (the proportional odds model, for ordered factors with more than 2 levels). The seed argument is set to 1337 (any number can be used) so that the same results are retrieved each time the multiple imputation is performed.

Code
Multiple_Imputation = mice(data = credit, maxit = 10, m = 10, defaultMethod = c("pmm", "logreg", "polyreg", "polr"), seed = 1337)

 iter imp variable
  1   1  Home  Marital  Job  Income  Assets  Debt
  1   2  Home  Marital  Job  Income  Assets  Debt
  1   3  Home  Marital  Job  Income  Assets  Debt
  1   4  Home  Marital  Job  Income  Assets  Debt
  1   5  Home  Marital  Job  Income  Assets  Debt
  1   6  Home  Marital  Job  Income  Assets  Debt
  1   7  Home  Marital  Job  Income  Assets  Debt
  1   8  Home  Marital  Job  Income  Assets  Debt
  1   9  Home  Marital  Job  Income  Assets  Debt
  1   10  Home  Marital  Job  Income  Assets  Debt
  2   1  Home  Marital  Job  Income  Assets  Debt
  2   2  Home  Marital  Job  Income  Assets  Debt
  2   3  Home  Marital  Job  Income  Assets  Debt
  2   4  Home  Marital  Job  Income  Assets  Debt
  2   5  Home  Marital  Job  Income  Assets  Debt
  2   6  Home  Marital  Job  Income  Assets  Debt
  2   7  Home  Marital  Job  Income  Assets  Debt
  2   8  Home  Marital  Job  Income  Assets  Debt
  2   9  Home  Marital  Job  Income  Assets  Debt
  2   10  Home  Marital  Job  Income  Assets  Debt
  3   1  Home  Marital  Job  Income  Assets  Debt
  3   2  Home  Marital  Job  Income  Assets  Debt
  3   3  Home  Marital  Job  Income  Assets  Debt
  3   4  Home  Marital  Job  Income  Assets  Debt
  3   5  Home  Marital  Job  Income  Assets  Debt
  3   6  Home  Marital  Job  Income  Assets  Debt
  3   7  Home  Marital  Job  Income  Assets  Debt
  3   8  Home  Marital  Job  Income  Assets  Debt
  3   9  Home  Marital  Job  Income  Assets  Debt
  3   10  Home  Marital  Job  Income  Assets  Debt
  4   1  Home  Marital  Job  Income  Assets  Debt
  4   2  Home  Marital  Job  Income  Assets  Debt
  4   3  Home  Marital  Job  Income  Assets  Debt
  4   4  Home  Marital  Job  Income  Assets  Debt
  4   5  Home  Marital  Job  Income  Assets  Debt
  4   6  Home  Marital  Job  Income  Assets  Debt
  4   7  Home  Marital  Job  Income  Assets  Debt
  4   8  Home  Marital  Job  Income  Assets  Debt
  4   9  Home  Marital  Job  Income  Assets  Debt
  4   10  Home  Marital  Job  Income  Assets  Debt
  5   1  Home  Marital  Job  Income  Assets  Debt
  5   2  Home  Marital  Job  Income  Assets  Debt
  5   3  Home  Marital  Job  Income  Assets  Debt
  5   4  Home  Marital  Job  Income  Assets  Debt
  5   5  Home  Marital  Job  Income  Assets  Debt
  5   6  Home  Marital  Job  Income  Assets  Debt
  5   7  Home  Marital  Job  Income  Assets  Debt
  5   8  Home  Marital  Job  Income  Assets  Debt
  5   9  Home  Marital  Job  Income  Assets  Debt
  5   10  Home  Marital  Job  Income  Assets  Debt
  6   1  Home  Marital  Job  Income  Assets  Debt
  6   2  Home  Marital  Job  Income  Assets  Debt
  6   3  Home  Marital  Job  Income  Assets  Debt
  6   4  Home  Marital  Job  Income  Assets  Debt
  6   5  Home  Marital  Job  Income  Assets  Debt
  6   6  Home  Marital  Job  Income  Assets  Debt
  6   7  Home  Marital  Job  Income  Assets  Debt
  6   8  Home  Marital  Job  Income  Assets  Debt
  6   9  Home  Marital  Job  Income  Assets  Debt
  6   10  Home  Marital  Job  Income  Assets  Debt
  7   1  Home  Marital  Job  Income  Assets  Debt
  7   2  Home  Marital  Job  Income  Assets  Debt
  7   3  Home  Marital  Job  Income  Assets  Debt
  7   4  Home  Marital  Job  Income  Assets  Debt
  7   5  Home  Marital  Job  Income  Assets  Debt
  7   6  Home  Marital  Job  Income  Assets  Debt
  7   7  Home  Marital  Job  Income  Assets  Debt
  7   8  Home  Marital  Job  Income  Assets  Debt
  7   9  Home  Marital  Job  Income  Assets  Debt
  7   10  Home  Marital  Job  Income  Assets  Debt
  8   1  Home  Marital  Job  Income  Assets  Debt
  8   2  Home  Marital  Job  Income  Assets  Debt
  8   3  Home  Marital  Job  Income  Assets  Debt
  8   4  Home  Marital  Job  Income  Assets  Debt
  8   5  Home  Marital  Job  Income  Assets  Debt
  8   6  Home  Marital  Job  Income  Assets  Debt
  8   7  Home  Marital  Job  Income  Assets  Debt
  8   8  Home  Marital  Job  Income  Assets  Debt
  8   9  Home  Marital  Job  Income  Assets  Debt
  8   10  Home  Marital  Job  Income  Assets  Debt
  9   1  Home  Marital  Job  Income  Assets  Debt
  9   2  Home  Marital  Job  Income  Assets  Debt
  9   3  Home  Marital  Job  Income  Assets  Debt
  9   4  Home  Marital  Job  Income  Assets  Debt
  9   5  Home  Marital  Job  Income  Assets  Debt
  9   6  Home  Marital  Job  Income  Assets  Debt
  9   7  Home  Marital  Job  Income  Assets  Debt
  9   8  Home  Marital  Job  Income  Assets  Debt
  9   9  Home  Marital  Job  Income  Assets  Debt
  9   10  Home  Marital  Job  Income  Assets  Debt
  10   1  Home  Marital  Job  Income  Assets  Debt
  10   2  Home  Marital  Job  Income  Assets  Debt
  10   3  Home  Marital  Job  Income  Assets  Debt
  10   4  Home  Marital  Job  Income  Assets  Debt
  10   5  Home  Marital  Job  Income  Assets  Debt
  10   6  Home  Marital  Job  Income  Assets  Debt
  10   7  Home  Marital  Job  Income  Assets  Debt
  10   8  Home  Marital  Job  Income  Assets  Debt
  10   9  Home  Marital  Job  Income  Assets  Debt
  10   10  Home  Marital  Job  Income  Assets  Debt

The following R code shows the imputed values. Columns are imputations; rows are the observations that had missing values.

Code
head(Multiple_Imputation$imp, 10)
$Status
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Seniority
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Home
           1       2       3       4       5       6       7       8       9
30   parents   owner   owner    rent parents    rent parents parents   owner
240    owner parents   owner    rent   owner   owner   owner   other parents
1060   owner   owner parents    rent   other parents parents    rent parents
1677   owner    rent   owner   other   owner   other   owner    priv    rent
2389   other parents   owner parents    priv   other   owner parents    priv
2996   owner   owner   owner   other   other   other    rent    rent   owner
          10
30     other
240    owner
1060 parents
1677   owner
2389    priv
2996   owner

$Time
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Age
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Marital
           1       2      3       4     5       6      7      8     9      10
3319 married married single married widow married single single widow married

$Records
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Job
            1       2       3       4         5       6     7         8       9
30  freelance partime partime   fixed freelance   fixed fixed freelance partime
912   partime partime partime partime     fixed partime fixed     fixed partime
           10
30  freelance
912     fixed

$Expenses
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Income
       1   2   3   4   5   6   7   8   9  10
30   100 160 125  73  69 103  96 165 116  93
114   80  53  70 145 105 120  92  92 150 109
144  300 100 185 250  90 250 315 254 230 373
153  130 152  40 224 142 130  76  70  60 102
158  150 195 100  98  90 140 160 122 265 176
177  250 245 250 241 142 254 241 230 250 321
195  120 180  98 162 186  78  72 102 150 144
206  115 200 203 108 197  70 190 177 100  70
241  142  85 212 243 109 208 130 152 214 155
242  150 144 115 120 538 212  91 150 126 169
278   60 130 115 100  79 118  78 145  96 170
318   80 150  59  57 107  90 120 142 200  80
330  156 125 159 128 198 120 215 100 120  24
333  176 211 150 145 149 160 113 152 162 114
335  168 113 132 109 180 318  99  51  75 130
356  100 176 105 113 170 188 100 135 143 156
360  113  80  95 164 160 140  69  83 124  80
394  500 500 350 500 905 491 150 500 350 500
404  130 150 251 107 121  96  70 280  71 182
422   71 174  92 225 154 330 150 135  58 174
439  115 103 160 136  60  88  98 168 189 145
444  275 210 136 153  80  78 242  63  80 155
462  156 210 158 120 244  84 103 100  92 102
469   93 167  95 104  92 127 120  55 139  86
479   84  50  84 169  89 140  93  75 178 115
481  200 148 205 207 165  95 154 340 220 140
483   90 124 134 120 143 120 198  80 175 145
485   80 225  85 191 254  98  82 118 104 129
496   66  93  90  58 125  50  73  70  80 221
498  126 312 101  86  80 102 105  80 177  60
505   86  73 140 109  60  40  56 120 112 135
567  114  79 110  42  95 205 200  50 133 114
572   73  80  92  42 150 148  90 218 150  48
582   63  33  40  67 121  85 108  75  46  33
648  430 500 373 230 292 183  50 191 250 171
653   86 108 120 130 120 200 140  88  98  74
667  230 183 230 183 800 200 183 416 905  91
675  400 300 208 466 125  87  69 208 185 300
678  315  56 145 105 178 139 126 155 200 190
699  157  85  65 120 135 160 189 170 190 235
708  100 285 125  50 189 200  81 160 194 100
714   70  91  80 163  45 125   8 115  81  92
716  202 135 139  83  70 120  92  75  42  45
733  125 110  40  85 300 150 221 147 118 243
734  165 100  70 112  95 130  80 118  88 160
746  400 129 121 100  87 175  81 100 175 123
777   80 106 120 175 139  80  57 115 112  67
781   46 160  84 122 103  85  87  70  69 115
785  111 200 138 101 100 466 250 126 200 313
804  118 135 191  71 117 247 140 161  93  75
824   92 123 117  41  95 127 178  67  50  79
865  114  60 125  60  71 114 175  81  50 110
866  296 136 195 200 115 101 140  60 130 115
880  132 100 150 138 109 189  95 167 212 115
889  145 359 204  90 100 113 192  50 120 154
906   92 380 428 275 207 535 190 120  99 200
912   92 102  78  74  99 131 108  80  74 285
942   90 135 120  65  90 115  53 104  63  60
952  170 120 125 128 105 188 130 145 139 250
989   70  80  70 180 140 160 350 117 120  80
1001  82 110  63  71  40 105 115  50  85  96
1017 125 130 130 130 184  80 192  80 120 113
1039 128 190  53  61 185 350 250 128  90 100
1044  99  70  50  60 103  35  83  57  90  47
1069 138 250 173 225  70 100 183  63 110 240
1100 102  86  92 335  60  47  30 107  53  77
1111  65  25 100 175 111  80 111  72  78  45
1125 123 187  86 240  42 120 100 180 105 111
1168 246 147 155 606 218 201 107 207 235 500
1208 225 384 200  69 120 158 160 464 275  58
1226 220 180  51 188 335 170 300 382 150 219
1250 230 128 142 200 128 142  65 130  85 175
1257 197 185  45 217 120 298 214 240 260 150
1276  57 142 166  80 200 177 123 196 114 200
1281  95  74  70 117 105 110 300  92  57  70
1289  90 192  90 149  88  73  33  93  90  38
1297 120 198 189  77 127 200 146 218 116 115
1307 400  86 350 200 200 107  80 115 130 110
1314 120  84 164 161 400  70 100 131 340  64
1335 240  85  67  92 206 119 100 256 100 184
1364 275 229 606 146 150 500 144 289  50 120
1365 250 300 166 411 200  67 145 154 555 250
1366 466 140 152 205 125 154 250 200 250  95
1392 150 150 500 500 905 491 150 150 491 337
1421 200 153 210 350 243 125 161 106 131  54
1427 110 193 164 122 130 110 134 419 208 130
1433  90  44 124 113  53  50  70  95  60  60
1436 120 122 130 111 218 178 120 179 115 364
1437 107  81 176  94 113 110 135  80 194 140
1441  90  65 187 100 450  74  92 140  73 123
1456  93  92 140 140 176 204  90  90 227  75
1473 275 229 118  70 166 155 187 384 154 143
1509 128 110 135 190 142 137 160  42 159 195
1513  65  81 125 127  70  84  60  66  60  76
1530  56 140 130 161 135  82 129  80  90  60
1535 160 128 131 160 169 198  50 115  59 126
1536 180 120 250 250 350 300 231 288 100 220
1544 100 125 156  90  56  63 230  88  95  80
1549  90  70 125 142 126 106 180 100 175  77
1564  50 208 202 135  85 172  90 116 180 100
1580 114 110 130  67 103 113  74  45 111  60
1583 156 125 147 200  67  28  90 198  75  60
1598  39 251 250 146 200  55 150 135 200 211
1599 113 140 145 150 107 350  81 133  35 134
1619  85  70 218 100  90 143 349 126 176  55
1629 136 103 197 106  81 152 103 180 145 137
1648  58  82 140 106 115  52  75  45  90  61
1662 175 148  80  99  65 100  88 190 174 137
1677 179 225 155  42 200 113 200 103  87 257
1685 250  60  87 110 260 400  79 210 241 142
1722 185 211  72 153 216 426  87 250 135 146
1724  87  70 150  59 300 175 145  52 104 100
1733 174  92 100 200 289 113 247 110 394 100
1741  60  78 137  55  93  63  67 106  80  70
1745 150 100 120 149 147  80 198 150 242 190
1753 101  48  58 150  90  73 140  45  30  70
1762 148 117  85  91 140 132  27  63  48 198
1766 126 300 300  92 150 160 217 200 145  79
1771 148  60 120 500 214 413 255 100 105  35
1798 150 135  97  72 135 150 180  91 100  78
1802 150 491 500 500 200 491 500 150 500 500
1803 170 100 200  92 140  97 130 156 232 150
1807 114  75 110 192  28 155  80 126 121 225
1811  89  99 113 110 157  42  71  80 119  72
1844  86  80 138 191 251  31 121 100 234 199
1851 250 250 151 260 430 250 166 151 430 250
1852 194  70 283 120 127 107  83  72 100 198
1870 214  97  89  74  75  95 167  77 180 152
1872  57  77  70 116 350 138 200  88  79 220
1882  70  77  92 136  76 139  52  90  42  50
1883 200 160  83 216 125 193 203  88 179 145
1893 500 350 200 491 183 241 241 905 150 500
1898 185  80  75 118  70  91 159 101 235  81
1903  86 120 165  65  91 219  72  55 120  65
1907 160 205  83 300 236  96 382 394 148 195
1920  48 100  80  92 116 120  94 130  25  63
1936 199  85 230 179  61 355  65 157 168 301
1946 106 100 125 144  64  93  79 103  95 105
1948 169 107  88  80 142 110 128  92  70  66
1962 127  87 137 135 140 120  60  97 117  90
1963 314  75  75 220  51 148 210 260 158 140
1965  85 122 115 148 500  95 116 144 111  87
1970 250 189 151 230 260 250 250 300 250 100
1972 150 500 200 500 150 150 905 491 500 500
1977  50 102  45 248 200  90 160 170 200 245
1979  82  78 211 162 130 113 150 200 177 157
1980  88  19  90  48  63  72  60 110 120  72
1984 200 247  72  60 223 142 186 160  66 170
2006  90 120  75 150 121 140 124 160  92 112
2016 240 107 100 225 168 162 100  52 104 142
2022 137 148 115 117 107 141 110 110 121  87
2025 108 240 188  68 335 250 133 212 250 150
2042 242 150  60 205  80 200 214  98 130  63
2043 120 167  96  35 183  80  58 180 140 120
2076 110 168 180 147  88  95 100 150 180  70
2077 172 190 110 187 110 180 177 210 201 285
2083 116  50 110 133 198  95 117 245  61 140
2156 173 315 138 200  95 175 160 161 160 225
2157 176 100 140 110 184  77  51 107  98 120
2186  81 107  83  70  92 160 364  67  77  75
2197 121 100  75  67  92  56  84  60 172 170
2205 235 236 145  79  79 155 190 296 191 110
2218 200 115  55  80 104 126 160 125  91 184
2227 130 120 102  60  85 130  96  35  66  96
2233 295 146 280 170 150  70 400 154 168 102
2240 207 289 154 340 275 157 301  80 120  19
2257 142  88 140 200 117  63 199 111 132 120
2280 151 700 144 350 144 250 144 171 186 275
2291 130 382 211  41 301 250 100 135 285 107
2297  65  70  30  46  63  69 161 120  75  66
2304  88  75  90  46  46 135  51  53  49  49
2310 233 175 270 373 300 250 125 195 257 100
2323 135  50  60  56  78  92  42  60  49 128
2331 905 491 241 500 200 200 200 491 150 337
2337 192  55  60 182  75 125  72  70  87  40
2349 248 137 200 214 173  97 214 200 264 114
2365 700 150 380 260 188 105  79 131 128 320
2369 165 120 201 200 158 260 120 196 170 110
2387 148 100  92 141 130  60  65  60  45  60
2396 130 150 140  95  90 140  60 115  25  74
2399 470  96  67 115  72  86 180 240 107 125
2402 115 154 220  42 428 300 289 223 190 380
2404 160 200 126  65 165 150 150  81  96 189
2437 166 250 189 115 191 315 250  60 400 195
2445 159 100 174 128  40 137 152  96 125  87
2446 285 110 150 108 250  69 242 154  65 105
2453 148 185  21  60 125  65  83 105 122  77
2460  95  81  66 171 149 135  58  40 152 182
2467  70  65 125  52 110 140  85 100  72 119
2473 108  65 113 146 104 148 140  56 140  95
2490  78  88 102  82  23 100 119  96  85  37
2495  60  92  82  63  71 139  70  50 161  66
2505  67 157 145 135 226 164 138 217 152 190
2566 160  40 120 114 220 123  90 199 160 225
2572 125 120 225 110 100  53 110  90  78  80
2578  76  50 107 160 235  82  87 150 107 142
2584  63 130 116  74  78 131  51  45 169 129
2596 175  55  78  76 102 178  42 102  81  56
2605 102  94 146 140  75  80  71 102 130 100
2614 100 210 130 170 250 172 150 242 189 222
2624 144  80  70 130 140  79  96 144  95  74
2625  85 170 198 162 135  90 120 200  90 300
2631 200  95  90 112 165 118 130 150  50 230
2632 295 110 197 233 178  71 130 116  92 100
2651 228 100 150 144 147  57 120  55 150  62
2652  35 210  42 118  70  75 147  77 180  79
2653  91 271 230 148 214 156  52 170 150 100
2668 100 155 195  76 265 111 125  80 192 189
2676 216  80 150 115 203 114  50 210 157 173
2681  65 122 142  92  60 120 200 106  91 140
2683 105 117  90 199  80 122 140 108 155 120
2695 141 121 114  93 196 200 107 155 135 130
2696 175  50  75  92  65  86  80 110  83  27
2707 167 218 130  67 127 240 105 122 129 170
2720 120 120 190  62 107 112 190 250  25 177
2723  90  86  74  56  55 130  47  82  60 100
2725  69  50 100  65 345  93  87 114 359  82
2730 150  84 212 180 150 159 100  88 201  50
2769 110  75 120 130 166 107  98  72  76 133
2780  77  60 120  93  68  50  85  80  68  86
2781 150 173  99 245 300 535 128 229 171 179
2802 215 180  98  84 166  92 242  91 120 170
2805 242 126 161 183 164 150 155  70  70 145
2806  82 106  45  70  69 200  54 158  85  80
2807 181 120 233 199 137 130  99 120  70 100
2810  88 110 250 167 163 139 220 115 155 120
2813 173 150 133 190 100  75 150  69  71 200
2815 100  76 115 119 163 101 117 100  62  46
2825 340 103 200  70 230 178 144 125  83 117
2854 300 135 114 128 300  96  68 250  79 442
2869 200 145 175 120 165 150 210 126  92 115
2882  86  90  75  65  47  53  55  78  53  52
2884 200  51  80 130  65 220 160 100  99  62
2893 450 130 115 105 104 110 198 129 199  80
2915  85 119 180 123  75 270 211 136 140  60
2927  53  70 100 156  64 135 155 105 136  69
2935 178 170 285 127  78  86  88  70  60 140
2936 189  72 191 300 128 695 150 218 162 190
2939 120  79  64  71  82 128 173 194 113 135
2951 250 178 200 416 171 178 416 245 491 337
2954 126 125 250 800 275  50 231 959 186 250
2969 130 120 250 206  80 176 500 270 136 154
2971 155 340  72 155 160 200 182 175 172 170
2979 300 240 180  94 531 143  94  94 300 300
2983 218 133  85 125 126  95  80 217 150 125
2991 105 149 120 127 250  92  49 121 150 182
2996 112  72 214  72 128 132 114 160 150 167
2999 217 428 251 100 380 350 143 144 107 700
3008 250 531 321 144 150 191 250 220 700 700
3014 101 181 160 100  76 350 135  57  78 128
3021 250 110  97 150 180 126 119 100 150 110
3026 110  64 415  80 216 148 117 109 100 180
3031 120  25  70  55 110  83  73  80 136 209
3038  85  49  31  49  67  48  33  33 121  53
3040 298 217 150 217 120 250 165 606 150 300
3069 189  54 210 215 200 100 250 173 125  75
3080 133  28  90  80  67  67  85 100  78 140
3096 142 113 195 115 160  60  88  75 135 300
3104  47  54  95 110 115  54  60  83  72 130
3106 101  93 149  70 140 135 200 162 251  77
3110 190 400 126 245 187 236  50 100 150 128
3121 138 101 383 126 450 101 111  69  50 101
3123 150 133  60 217  54 112 140 100 162 169
3139 300 257 459 100 270 314 100 250 364 257
3167  77 125  80 145  72 170 120  60 115  60
3170 160  86 190 112  85 107 152 117 230 160
3183  80 170 240  75 250 144 180 318 100  71
3185 110 190  80 165  77 168 100 110 200 178
3187 184  70 128 135  60 202 153 185  71  69
3203 144  92 219 195  74 160  80  71 117 154
3218 102  65  39  90 124 105 124  75  80 245
3222 188 212  95 110 198 172 280 149 180 106
3229  66 137 136 150 324 125 110 159  89 107
3233  92 180 129 400 400 230 606 118 442 100
3237 124 220 189 187 128 169 150 101 100 103
3245 145 120  55  85 119 119 139 125 233 120
3252 100  62  56  63 120 122 180  93  85 115
3266 134 160  54 170 117 195 105 145 110 133
3286 140 327  78 110 118  75  75 126 110 125
3288 160 106  93 277  66 202 100 125 127 120
3304 178 241 178 500 241 200 905 200 905 337
3310 102 155 110 225 120  50 200 125  87  96
3316 156 200  67  84 101 158 105 208  84 113
3325  78 102 101 216 188 140  73  79 140  45
3336 117 153  90 176 203 132 211 226 220 277
3338  91 800 416 178 416 200 245 178 200 183
3345 202 186 500 130 101 104 298 150 147  90
3352 207 163  74 196  74 165 345 100 200 150
3365 117 130 101  56 149  83 127 139 175 120
3382 223 190 178 130 145 145 156  63  90 366
3433  85 209  50  80  91  52  80  95 140  42
3439  74  95 135  65  90  78 139  63  61 139
3451 146  40  58 115  62 130  58  60 140  76
3452  53  52 120 100  60  75  60  96  75  42
3454 160 115  87 415 170 170 165 159  75 160
3456 135  72  95  73  70  90  48 120  72  63
3461 120 135 208 300 191 154  54  90 400 117
3462 181 125  60  60 130 140 129 163  70 160
3473 120  75 107 256  90 142 100 140 125  73
3477  90 108  67 130  60 140 150 135 135 100
3478 138  25  67  65 208  74  71 140  85 265
3494 147  80  78 127 134  61 105 103 100  78
3513 160  99 127 106 100 105  75 150 122  78
3523 293 156 122 180  75 190  95 102 140 100
3525 211 124 157 115 110 105 125 165 250 315
3534 200 190 117 114 120 210 102 300 135 130
3556  78 230 201 185 333  60 170 120 200 174
3641 150 230 290 160 100 238 214 150 196  92
3645  71 144 140  95 115 111 143 197  52 125
3657 158  80  98  60  92  52  51 125  41  60
3674 150 178 158  95 169  80  72 105 120  55
3679 190 340 145 121 210 240 152 170 350  77
3691 100 129 208  69 250 120  97 263 110 100
3704 170 110  92  25 108 116  60 107  79 146
3709 130 125 234 300 213 190 121 150 251 225
3714  65 147 106  85  70  52 120  80 135 100
3717 110 100  92  93  70 127  81  75 105  92
3730  70  90  64 100 200 128 111 120  90  80
3740 176 122 200  53 120  55 175  73 184 116
3763  61  32 126 250 105  67  85  91 125  90
3768 126 175 200 112  98 176 102 350 152 129
3773 380 321 300 250 130 320 118 320 360 120
3794 124 176 179  70 206 400 250  92 125  79
3800 161  90 144  85 156  70  95 173  93  90
3823 185  76 100 150 160 128 163  96  60 114
3825  42  67 121  49  53  33  49  49  67  53
3850  95 132 175 245 134 233 152 117 250 142
3855 132 125 135 223 102  95 124 133 120 135
3857  77  40  70  35 102  80  70  68 162 162
3858  50  42 101  60  45 140  80  50 130 105
3882  61  93  92 128  68  60 240  98  80 132
3887 143 115 168 133 142 250 187  95 127 100
3892 303 110 205 190  83  90 170 150  90  89
3902 120 310 235 155 179 160 155 190  75 190
3914 171 141 180  90 142  82  85 110  88  95
3928 120 300  50 400 180 230 260 350 250 350
3932 112 100 129 111  64 260  80  91  65  62
3945 201  84  60 153 181  61 141 152 145 200
3946 380 236  65  99 199 390  85 212 169 145
3947  92  32  92  63 119  96  85  77  66 122
3951  85 170 183  98 100 140 135  80 125 147
3955 106  70 219  61  94 100 120 115  80 105
3966 118  50 117 136 102  66 410 105 143  93
3992 188  78  88 105 210 114 110 130  85 110
4003 159 250 250 112  72 139 120 110 187 139
4023 178 333 500 170 500 121 169 200 246  20
4036  85  95 100  78  64  87 169  70  82  69
4049  58  33  85 165  67  31  19 121  53  75
4064  98  96 167  50 120 100 196 250  86 107
4069 210 114 100  74 135 178 107  75 340 158
4076  85 165  72  53  88  68  45 121  33 121
4082 137 150 161 165  46 177 147  80 167 195
4085  50 214  78 159 114 212  40 200 231 182
4096 104 130 143 160 169 300 165 200 204 212
4119 200  79 103 120  70 155 161  55  80 102
4159 208 148 115 211 100  55 120 100 201 198
4168 150 127 150 118  96 100 134  90  60  70
4173 150  45  78  83  72  73 105  76 209  72
4181 145  78 107 117  78 102 125 204  74 108
4191 185  86 115 107 119 136 130 135 300  70
4198 411 250 428 250 325 360  97  87 300 250
4199 137 175 130  75 200  76 148  40 350 127
4222 105 219  92  99 195 160  67  92  92 106
4223  80 109  95 110 170  92 110  69  85 120
4237 227 100 100 200  88  90 125 120 208  60
4246 140 150  60 159 102  90 366  80  72  90
4247  60 113 200  60 120 175  65  72 113  70
4256 191 135 154 371 300 144  65 107 320 113
4281 110  65  75 175  95  80  25  92  63  73
4295 111  70 115  60  80 123  79 100  91  51
4333 192  42  55  90 202  73  67  63 105  90
4349  81 165 203  95  97 150 115 140 120 189
4368  70 109  90  67 180 163 114  75 109  92
4373 200  70 170  84 120 124  91 150 100  83
4398 110 110 144 120  90  90 169 120 274 199
4411 100 380 110 121 380 345 100 394 121  62
4420 241 500 491 150 905 500 150 491 905 905
4433 164 115 145 160  38 169 115 116 208  55
4436 136  80 128  70  98   8 185  77 149 168
4440 125 197 186 400 187 260 100  90 135 178
4441 100 155  71 289 195 535 260 100 320 159

Statistical Modeling

We can check the quality of the imputations with a strip plot, a one-dimensional scatter plot that shows the distribution of each variable in the observed data and in every imputed dataset side by side. Plausible imputations should look like values that could have been observed had the data not been missing.

Code
# mice's stripplot() is lattice-based, so par(mfrow) has no effect on it;
# each variable is therefore plotted in its own call below
stripplot(Multiple_Imputation, Status, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Seniority, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Home, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Time, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Age, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Marital, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Records, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Job, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Expenses, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Income, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Assets, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Debt, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Amount, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Price, pch = 19, xlab = "Imputation number")
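Since these panels are generated one call at a time, it may help to know that mice's stripplot() accepts a lattice formula, so several continuous variables can be drawn in a single display. A minimal sketch, assuming the mids object Multiple_Imputation from above:

```r
# the left-hand side lists the variables to plot; .imp indexes the
# imputation number on the x-axis (0 denotes the observed data)
library(mice)
stripplot(Multiple_Imputation,
          Income + Assets + Debt + Amount ~ .imp,
          pch = 19, xlab = "Imputation number")
```

By default, blue points are observed values and red points are imputed values; healthy imputations show red points falling within the range of the blue.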

Next, we pool the results across the imputed datasets to arrive at estimates that properly account for the uncertainty due to the missing data. We fit the analysis model to each completed dataset with the with() function and display the summary of the pooled results, which reports the estimate, standard error, test statistic, degrees of freedom, and p-value for each term.

Code
# fit complete-data model
fit <- with(Multiple_Imputation,
            glm(Status ~ Seniority + Home + Time + Age + Marital + Records +
                  Job + Expenses + Income + Assets + Debt + Amount + Price,
                family = binomial))

# pool and summarize the results
summary(pool(fit))
               term      estimate    std.error    statistic         df      p.value
1       (Intercept)  9.814751e-01 7.334980e-01   1.33807480 4300.24373 1.809428e-01
2         Seniority  8.312981e-02 7.467015e-03  11.13293702 4362.76288 2.086587e-28
3         Homeother  6.234888e-02 5.733104e-01   0.10875241 4423.75455 9.134038e-01
4         Homeowner  1.150920e+00 5.595613e-01   2.05682607 4419.16607 3.976153e-02
5       Homeparents  9.428102e-01 5.677649e-01   1.66056444 4425.40067 9.687180e-02
6          Homepriv  4.243125e-01 5.770735e-01   0.73528329 4414.59126 4.622060e-01
7          Homerent  4.218033e-01 5.624968e-01   0.74987674 4426.28471 4.533688e-01
8              Time -1.902847e-04 3.482487e-03  -0.05464046 4143.03176 9.564275e-01
9               Age -1.105297e-02 4.990176e-03  -2.21494640 4252.79778 2.681655e-02
10   Maritalmarried  6.024318e-01 4.197272e-01   1.43529369 4207.32624 1.512778e-01
11 Maritalseparated -6.781777e-01 4.640929e-01  -1.46129728 4250.10047 1.440078e-01
12    Maritalsingle  1.560922e-01 4.251030e-01   0.36718673 4231.26580 7.134981e-01
13     Maritalwidow  1.612457e-01 5.303590e-01   0.30403116 4103.30382 7.611196e-01
14       Recordsyes -1.783461e+00 1.025704e-01 -17.38768069 3912.89251 2.746080e-65
15     Jobfreelance -7.635440e-01 1.027294e-01  -7.43257641 3427.58461 1.337976e-13
16        Jobothers -7.027346e-01 2.031269e-01  -3.45958441 3940.95314 5.467464e-04
17       Jobpartime -1.472557e+00 1.258036e-01 -11.70520155 4411.06706 3.451846e-31
18         Expenses -1.505503e-02 2.646683e-03  -5.68826244 2690.31713 1.421792e-08
19           Income  7.120761e-03 8.203147e-04   8.68052331   86.96945 2.029941e-13
20           Assets  2.248416e-05 6.719016e-06   3.34634685  325.31247 9.146867e-04
21             Debt -1.693532e-04 3.847255e-05  -4.40192333  174.24479 1.866292e-05
22           Amount -1.929480e-03 1.717220e-04 -11.23606588 4090.81562 7.151376e-29
23            Price  8.709653e-04 1.261683e-04   6.90320427 4339.59729 5.820776e-12
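Because the pooled model is a logistic regression, the log-odds estimates above can be converted to odds ratios for easier interpretation. A sketch reusing the fit object, assuming mice version 3.0 or later for the conf.int argument:

```r
# pool with confidence intervals, then exponentiate the
# log-odds coefficients and their limits into odds ratios
pooled <- summary(pool(fit), conf.int = TRUE)
odds_ratios <- data.frame(
  term = pooled$term,
  OR   = exp(pooled$estimate),
  lo   = exp(pooled[["2.5 %"]]),
  hi   = exp(pooled[["97.5 %"]])
)
odds_ratios
```

For example, an odds ratio below 1 for Recordsyes would indicate that having prior records lowers the odds of the outcome, holding the other predictors fixed.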

Conclusion

In conclusion, missing data can occur in research for a variety of reasons, and it is never a good idea to ignore it. Doing so leads to biased parameter estimates, loss of information, decreased statistical power, and weaker reliability of findings (Dong and Peng 2013). The best course of action is to handle the missing data with multiple imputation. When missing data is discovered, first identify it and look for missing-data patterns. Next, select the variables in the dataset that are related to the missing values for use in the imputation model. Create the necessary number of completed datasets, fit the analysis model to each one, pool the results, and finally interpret the pooled estimates. Performing these steps will minimize the adverse effects of missing data on the analysis (Pampaka, Hutcheson, and Williams 2016).
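The steps above can be sketched end-to-end with the mice package. This is a minimal sketch, not the exact code used in this analysis; the data frame name credit_data, m = 10, and the seed are assumptions:

```r
library(mice)

# 1. identify missingness and inspect the missing-data pattern
md.pattern(credit_data)          # credit_data is a hypothetical data frame

# 2. create m completed datasets by multiple imputation
imp <- mice(credit_data, m = 10, seed = 123, printFlag = FALSE)

# 3. fit the analysis model within each completed dataset
fit <- with(imp, glm(Status ~ Seniority + Age + Income, family = binomial))

# 4. pool the m sets of estimates using Rubin's rules
summary(pool(fit))
```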

References

Arnab, R. 2017. Survey Sampling Theory and Applications. Academic Press. https://www.sciencedirect.com/topics/mathematics/imputation-method.
Azur, M. J., E. A. Stuart, C. Frangakis, and P. J. Leaf. 2011. “Multiple Imputation by Chained Equations: What Is It and How Does It Work?” International Journal of Methods in Psychiatric Research 20 (1): 40–49. https://onlinelibrary.wiley.com/doi/epdf/10.1002/mpr.329.
Cao, Y., H. Allore, B. Vander Wyk, and R. Gutman. 2021. “Review and Evaluation of Imputation Methods for Multivariate Longitudinal Data with Mixed-Type Incomplete Variables.” Statistics in Medicine 41 (30): 5844–76. https://doi-org.ezproxy.lib.uwf.edu/10.1002/sim.9592.
Dong, Y., and C. J. Peng. 2013. “Principled Missing Data Methods for Researchers.” SpringerPlus 2 (222). https://doi.org/10.1186/2193-1801-2-222.
Mainzer, R., M. Moreno-Betancur, C. Nguyen, J. Simpson, J. Carlin, and K. Lee. 2023. “Handling of Missing Data with Multiple Imputation in Observational Studies That Address Causal Questions: Protocol for a Scoping Review.” BMJ Open 13: 1–6. http://dx.doi.org/10.1136/bmjopen-2022-065576.
Pampaka, M., G. Hutcheson, and J. Williams. 2016. “Handling Missing Data: Analysis of a Challenging Data Set Using Multiple Imputation.” International Journal of Research & Method in Education 39 (1): 19–37. https://doi.org/10.1080/1743727X.2014.979146.
Pedersen, A. B., E. M. Mikkelsen, D. Cronin-Fenton, N. R. Kristensen, T. M. Pham, L. Pedersen, and I. Petersen. 2017. “Missing Data and Multiple Imputation in Clinical Epidemiological Research.” Clinical Epidemiology 9: 157–66. https://www.tandfonline.com/doi/full/10.2147/CLEP.S129785.
Schafer, J. L., and J. W. Graham. 2002. “Missing Data: Our View of the State of the Art.” Psychological Methods 7 (2): 147–77. https://psycnet.apa.org/doi/10.1037/1082-989X.7.2.147.
Streiner, D. L. 2008. “Missing Data and the Trouble with LOCF.” Evidence-Based Mental Health 11 (1): 1–5. http://dx.doi.org/10.1136/ebmh.11.1.3-a.
Thongsri, T., and K. Samart. 2022. “Composite Imputation Method for the Multiple Linear Regression with Missing at Random Data.” International Journal of Mathematics and Computer Science 17 (1): 51–62. http://ijmcs.future-in-tech.net/17.1/R-Samart.pdf.
van Buuren, S., and K. Groothuis-Oudshoorn. 2011. “mice: Multivariate Imputation by Chained Equations in R.” Journal of Statistical Software 45 (3): 1–67. https://doi.org/10.18637/jss.v045.i03.
van Ginkel, J. R., M. Linting, R. C. Rippe, and A. van der Voort. 2020. “Rebutting Existing Misconceptions about Multiple Imputation as a Method for Handling Missing Data.” Journal of Personality Assessment 102 (3): 297–308. https://doi.org/10.1080/00223891.2018.1530680.
Wongkamthong, C., and O. Akande. 2023. “A Comparative Study of Imputation Methods for Multivariate Ordinal Data.” Journal of Survey Statistics and Methodology 11 (1): 189–212. https://doi.org/10.1093/jssam/smab028.
Wulff, J. N., and L. E. Jeppesen. 2017. “Multiple Imputation by Chained Equations in Praxis: Guidelines and Review.” Electronic Journal of Business Research Methods 15 (1): 41–56. https://vbn.aau.dk/ws/files/257318283/ejbrm_volume15_issue1_article450.pdf.